Robust Mean Estimation in High Dimensions: An Outlier-Fraction Agnostic and Efficient Algorithm

Authors

Abstract

The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the $\ell_0$-'norm' of an outlier indicator vector, under a second moment constraint on the datapoints. The $\ell_0$-'norm' is then relaxed to the $\ell_p$-norm ($0 < p \le 1$) in the objective, and it is shown that the global minima for each of these objectives are order-optimal for the robust mean estimation problem and have optimal breakdown point. Furthermore, a computationally tractable iterative $\ell_p$-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point $\approx 0.3$) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for $p = 1$ it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
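The abstract does not spell out the iterative $\ell_p$-minimization and hard thresholding algorithm, so the sketch below is not that algorithm. It is a simplified spectral filter that only illustrates two ingredients of the setting described above: a 0/1 indicator over the datapoints (here `keep[i]`, i.e. one minus the outlier indicator) and a second moment bound enforced on the retained points, without ever taking the outlier fraction as input. It assumes, hypothetically, that the inliers' covariance is bounded by `sigma2 * I` with `sigma2` known; the function name `spectral_filter_mean`, the `slack` factor, and the "drop points above the average score" rule are illustrative choices, not details from the paper.

```python
import numpy as np


def spectral_filter_mean(x, sigma2, slack=9.0, max_iter=100):
    """Hypothetical robust mean sketch via iterative spectral hard thresholding.

    Assumes the inliers' covariance is bounded by sigma2 * I (sigma2 known).
    The outlier fraction is never passed in. This is NOT the paper's
    l_p-minimization algorithm, only an illustration of the setting.
    """
    x = np.asarray(x, dtype=float)
    keep = np.ones(len(x), dtype=bool)  # keep[i] = 1 - h_i, h = outlier indicator
    for _ in range(max_iter):
        idx = np.flatnonzero(keep)
        if idx.size < 2:
            break  # too few points left to estimate a covariance
        centered = x[idx] - x[idx].mean(axis=0)
        cov = centered.T @ centered / idx.size
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        top_val, top_vec = eigvals[-1], eigvecs[:, -1]
        if top_val <= slack * sigma2:
            break  # second moment bound holds on the retained points: stop
        # Squared projections onto the top direction; their average equals top_val.
        scores = (centered @ top_vec) ** 2
        drop = scores > top_val  # hard threshold at the average score
        if not drop.any():
            # All scores tie: drop a single point so the loop still makes progress.
            drop = np.zeros_like(drop)
            drop[np.argmax(scores)] = True
        keep[idx[drop]] = False
    return x[keep].mean(axis=0)


# Usage: 500 Gaussian inliers plus a tight cluster of 100 far-away outliers.
rng = np.random.default_rng(0)
inliers = rng.standard_normal((500, 10))
outliers = np.full((100, 10), 10.0)
data = np.vstack([inliers, outliers])
robust_est = spectral_filter_mean(data, sigma2=1.0)
naive_est = data.mean(axis=0)
```

In this synthetic example the outlier cluster inflates the variance along one direction far beyond the assumed bound, so the filter discards it, while the plain sample mean is pulled toward the outliers. Removing points along the top eigenvector is a deliberate simplification; it carries none of the order-optimality or breakdown-point guarantees claimed for the paper's method.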

Similar resources

Computationally Efficient Robust Sparse Estimation in High Dimensions

Many conventional statistical procedures are extremely sensitive to seemingly minor deviations from modeling assumptions. This problem is exacerbated in modern high-dimensional settings, where the problem dimension can grow with and possibly exceed the sample size. We consider the problem of robust estimation of sparse functionals, and provide a computationally and statistically efficient algor...

MONK - Outlier-Robust Mean Embedding Estimation by Median-of-Means

Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, its classical e...

Robust Sparse Estimation Tasks in High Dimensions

In this paper we initiate the study of whether or not sparse estimation tasks can be performed efficiently in high dimensions, in the robust setting where an ε-fraction of samples are corrupted adversarially. We study the natural robust version of two classical sparse estimation problems, namely, sparse mean estimation and sparse PCA in the spiked covariance model. For both of these problems, w...

Outlier identification in high dimensions

A computationally fast procedure for identifying outliers is presented, that is particularly effective in high dimensions. This algorithm utilizes simple properties of principal components to identify outliers in the transformed space, leading to significant computational advantages for high dimensional data. This approach requires considerably less computational time than existing methods for ...

Global High Dimension Outlier Algorithm for Efficient Clustering and Outlier Detection

In this digital era, most of the knowledge available on the market is in digital form. For several years, people have held the hypothesis that using phrases for the representation of documents and topics ought to perform better than terms. In this paper we examine and investigate this by considering many state-of-the-art data processin...

Journal

Journal title: IEEE Transactions on Information Theory

Year: 2023

ISSN: 0018-9448, 1557-9654

DOI: https://doi.org/10.1109/tit.2023.3249197